    Revisiting Wedge Sampling for Budgeted Maximum Inner Product Search

    Top-k maximum inner product search (MIPS) is a central task in many machine learning applications. This paper extends top-k MIPS to a budgeted setting, which asks for the best approximate top-k MIPS given a limit of B computational operations. We investigate recent advanced sampling algorithms, including wedge and diamond sampling, to solve it. Although the design of these sampling schemes naturally supports budgeted top-k MIPS, they suffer from the linear cost of scanning all data points to retrieve top-k results and from performance degradation when handling negative inputs. This paper makes two main contributions. First, we show that diamond sampling is essentially a combination of wedge sampling and basic sampling for top-k MIPS. Our theoretical analysis and empirical evaluation show that wedge sampling is competitive with (and often superior to) diamond sampling in approximating top-k MIPS, in terms of both efficiency and accuracy. Second, we propose a series of algorithmic engineering techniques to deploy wedge sampling on budgeted top-k MIPS. Our novel deterministic wedge-based algorithm runs significantly faster than state-of-the-art methods for budgeted and exact top-k MIPS while maintaining top-5 precision of at least 80% on standard recommender system data sets. Comment: ECML-PKDD 202
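    The abstract names wedge sampling without spelling it out. The sketch below shows classic randomized wedge sampling, not the authors' deterministic variant: a dimension j is chosen with probability proportional to |q_j| * c_j, where c_j is the sum of |x_ij| over the column, then a point i is chosen within that column with probability |x_ij| / c_j, and its counter is incremented by sign(q_j * x_ij), so the expected counter of point i is proportional to the inner product q . x_i even when entries are negative. A budgeted post-processing step computes exact inner products only for the highest-counter candidates. The function name and the parameters num_samples and num_candidates are hypothetical; this is an illustrative sketch under those assumptions, not the paper's implementation.

```python
import numpy as np


def wedge_sampling_topk(X, q, k, num_samples, num_candidates, seed=0):
    """Approximate top-k MIPS with randomized wedge sampling (illustrative sketch).

    X: (n, d) data matrix; q: (d,) query. Entries may be negative.
    num_samples: number of wedge samples drawn (hypothetical stand-in for the budget).
    num_candidates: how many top-counter points receive an exact inner product.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    abs_X = np.abs(X)
    col_sums = abs_X.sum(axis=0)          # c_j = sum_i |x_ij|
    weights = np.abs(q) * col_sums        # w_j = |q_j| * c_j
    total = weights.sum()
    if total == 0:                        # every dimension has zero weight: nothing to sample
        return np.arange(min(k, n))

    # Stage 1: pick dimensions j with probability w_j / sum(w).
    dims = rng.choice(d, size=num_samples, p=weights / total)

    # Stage 2: within each chosen dimension, pick points i with probability |x_ij| / c_j.
    counters = np.zeros(n)
    for j in np.unique(dims):
        s_j = int(np.count_nonzero(dims == j))
        rows = rng.choice(n, size=s_j, p=abs_X[:, j] / col_sums[j])
        # Signed vote: E[counter_i] is proportional to q . x_i, even with negative entries.
        np.add.at(counters, rows, np.sign(q[j] * X[rows, j]))

    # Stage 3 (budgeted post-processing): exact inner products for the top candidates only.
    cand = np.argsort(-counters)[:num_candidates]
    scores = X[cand] @ q
    return cand[np.argsort(-scores)[:k]]
```

    Raising num_candidates trades more budget for accuracy: with num_candidates equal to n, the post-processing step scores every point and the result is the exact top-k.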

    Second Order PAC-Bayesian Bounds for the Weighted Majority Vote

    We present a novel analysis of the expected risk of the weighted majority vote in multiclass classification. The analysis takes the correlation of predictions by ensemble members into account and provides a bound that is amenable to efficient minimization, which yields improved weighting for the majority vote. We also provide a specialized version of our bound for binary classification, which makes it possible to exploit additional unlabeled data for tighter risk estimation. In experiments, we apply the bound to improve the weighting of trees in random forests and show that, in contrast to the commonly used first-order bound, minimization of the new bound typically does not lead to degradation of the test error of the ensemble.
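    To make "correlation of predictions by ensemble members" concrete, the sketch below computes the empirical tandem loss (the fraction of examples on which two members err simultaneously) for every pair of members, together with the second-order quantity 4 * E_{rho^2}[L(h, h')]. If the rho-weighted vote errs, at least half of the rho-weight errs, so on any sample the empirical majority-vote error is at most this quantity. The sketch assumes rho is a probability vector and labels are integers 0..K-1; the function names are hypothetical, and it computes only this empirical second-order quantity, not the full PAC-Bayesian bound with its KL-divergence and confidence terms.

```python
import numpy as np


def tandem_loss_matrix(preds, y):
    """Empirical tandem loss L(h_a, h_b) for every pair of ensemble members.

    preds: (m, n) integer array of labels predicted by m members on n examples.
    y:     (n,) integer array of true labels.
    Entry [a, b] is the fraction of examples on which members a and b both err.
    """
    errors = (preds != y).astype(float)   # (m, n) mistake indicators
    return errors @ errors.T / y.shape[0]


def weighted_majority_vote(preds, rho, num_classes):
    """rho-weighted majority vote over labels 0..num_classes-1 (ties go to the lowest label)."""
    m, n = preds.shape
    votes = np.zeros((num_classes, n))
    for a in range(m):
        # Each (label, example) pair is unique per member, so += is safe here.
        votes[preds[a], np.arange(n)] += rho[a]
    return votes.argmax(axis=0)


def second_order_quantity(preds, y, rho):
    """4 * E_{rho^2}[tandem loss]; upper-bounds the empirical majority-vote error."""
    T = tandem_loss_matrix(preds, y)
    return 4.0 * float(rho @ T @ rho)
```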

    TRY plant trait database - enhanced coverage and open access

    Plant traits (the morphological, anatomical, physiological, biochemical and phenological characteristics of plants) determine how plants respond to environmental factors, affect other trophic levels, and influence ecosystem properties and their benefits and detriments to people. Plant trait data thus represent the basis for a vast area of research spanning evolutionary biology, community and functional ecology, biodiversity conservation, ecosystem and landscape management, restoration, biogeography and earth system modelling. Since its foundation in 2007, the TRY database of plant traits has grown continuously. It now provides unprecedented data coverage under an open access data policy and is the main plant trait database used by the research community worldwide. Increasingly, the TRY database also supports new frontiers of trait-based plant research, including the identification of data gaps and the subsequent mobilization or measurement of new data. To support this development, in this article we evaluate the extent of the trait data compiled in TRY and analyse emerging patterns of data coverage and representativeness. The best species coverage is achieved for categorical traits, with almost complete coverage for 'plant growth form'. However, most traits relevant for ecology and vegetation modelling are characterized by continuous intraspecific variation and trait-environment relationships. These traits have to be measured on individual plants in their respective environment. Despite unprecedented data coverage, we observe a humbling lack of completeness and representativeness of these continuous traits in many aspects. We therefore conclude that reducing data gaps and biases in the TRY database remains a key challenge and requires a coordinated approach to data mobilization and trait measurements. This can only be achieved in collaboration with other initiatives.

    Global delivery models: the role of talent, speed and time zones in the global outsourcing industry

    Global delivery models (GDMs) are transforming the global IT and business process outsourcing industry. GDMs are a new form of client-specific investment that promotes service integration with clients by combining client proximity with time-zone spread for 24/7 service operations. We investigate the antecedents and contingencies of setting up GDM structures. Based on comprehensive data, we show that providers are likely to establish GDM location configurations when clients value access to globally distributed talent and speed of service delivery, in particular when services are highly commoditized. Our findings imply that coordination across time zones increasingly affects international operations in business-to-business and born-global industries.

    Species-specific responses of Late Quaternary megafauna to climate and humans

    Despite decades of research, the roles of climate and humans in driving the dramatic extinctions of large-bodied mammals during the Late Quaternary remain contentious. We use ancient DNA, species distribution models and the human fossil record to elucidate how climate and humans shaped the demographic history of woolly rhinoceros, woolly mammoth, wild horse, reindeer, bison and musk ox. We show that climate has been a major driver of population change over the past 50,000 years. However, each species responds differently to the effects of climatic shifts, habitat redistribution and human encroachment. Although climate change alone can explain the extinction of some species, such as Eurasian musk ox and woolly rhinoceros, a combination of climatic and anthropogenic effects appears to be responsible for the extinction of others, including Eurasian steppe bison and wild horse. We find no genetic signature or any distinctive range dynamics distinguishing extinct from surviving species, underscoring the challenges associated with predicting future responses of extant mammals to climate and human-mediated habitat change.
    This paper is dedicated to the memory of our friend and colleague Dr. Andrei Sher, who was a major contributor to this study. Dr. Sher died unexpectedly, but his major contributions to the field of Quaternary science will be remembered and appreciated for many years to come. We are grateful to Dr. Adrian Lister and Dr. Tony Stuart for guidance and discussions. Thanks to Tina B. Brandt, Dr. Bryan Hockett and Alice Telka for laboratory help and samples, and to L. Malik R. Thrane for his work on the megafauna locality database. Data taken from the Stage 3 project was partly funded by Grant #F/757/A from the Leverhulme Trust, together with a grant from the McDonald Grants and Awards Fund. We acknowledge the Danish National Research Foundation, the Lundbeck Foundation, the Danish Council for Independent Research and the US National Science Foundation for financial support.

    Diversity Promotes Temporal Stability across Levels of Ecosystem Organization in Experimental Grasslands

    The diversity–stability hypothesis states that current losses of biodiversity can impair the ability of an ecosystem to dampen the effect of environmental perturbations on its functioning. Using data from a long-term and comprehensive biodiversity experiment, we quantified the temporal stability of 42 variables characterizing twelve ecological functions in managed grassland plots varying in plant species richness. We demonstrate that diversity increases stability i) across trophic levels (producer, consumer), ii) at both the system (community, ecosystem) and the component levels (population, functional group, phylogenetic clade), and iii) primarily for aboveground rather than belowground processes. Temporal synchronization across the studied variables was mostly unaffected by increasing species richness. This study provides the strongest empirical support so far that diversity promotes stability across different ecological functions and levels of ecosystem organization in grasslands.

    Massive X-ray screening reveals two allosteric drug binding sites of SARS-CoV-2 main protease

    The coronavirus disease (COVID-19) caused by SARS-CoV-2 is creating tremendous health problems and economic challenges for mankind. To date, no effective drug is available to directly treat the disease and prevent the virus from spreading. In a search for a drug against COVID-19, we have performed a massive X-ray crystallographic screen of repurposing drug libraries containing 5953 individual compounds against the SARS-CoV-2 main protease (Mpro), which is a promising drug target because it is essential for viral replication. In contrast to commonly applied X-ray fragment screening experiments with molecules of low complexity, our screen tested already approved drugs and drugs in clinical trials. From the three-dimensional protein structures, we identified 37 compounds binding to Mpro. In subsequent cell-based viral reduction assays, one peptidomimetic and five non-peptidic compounds showed antiviral activity at non-toxic concentrations. Interestingly, two compounds bind outside the active site to the native dimer interface in close proximity to the S1 binding pocket. Another compound binds in a cleft between the catalytic and dimerization domains of Mpro. Neither binding site is related to the enzymatic active site, and both represent attractive targets for drug development against SARS-CoV-2. This X-ray screening approach thus has the potential to help deliver an approved drug on an accelerated time-scale for this and future pandemics.

    X-ray screening identifies active site and allosteric inhibitors of SARS-CoV-2 main protease

    The coronavirus disease (COVID-19) caused by SARS-CoV-2 is creating tremendous human suffering. To date, no effective drug is available to directly treat the disease. In a search for a drug against COVID-19, we have performed a high-throughput X-ray crystallographic screen of two repurposing drug libraries against the SARS-CoV-2 main protease (Mpro), which is essential for viral replication. In contrast to commonly applied X-ray fragment screening experiments with molecules of low complexity, our screen tested already approved drugs and drugs in clinical trials. From the three-dimensional protein structures, we identified 37 compounds that bind to Mpro. In subsequent cell-based viral reduction assays, one peptidomimetic and six non-peptidic compounds showed antiviral activity at non-toxic concentrations. We identified two allosteric binding sites representing attractive targets for drug development against SARS-CoV-2.

    TRY plant trait database - enhanced coverage and open access

    This article has 730 authors, of which I have only listed the lead author and myself as a representative of the University of Helsinki. Peer reviewed.